[AHM] Prepare `election-provider-multi-block` for full lazy data deletion by kianenigma · Pull Request #8304 · paritytech/polkadot-sdk

kianenigma · 2025-04-23T09:36:31Z

Follow-up to: #8127

This PR makes all the hefty storage items in multi-block and multi-block::verifier to be keyed by a round index (note that the same has already been done for multi-block::signed).

This allows us to stop deleting all the data upon a round ending, and instead simply know that it is stale, and just delete it later via #[pallet::task], or a free extrinsic.

As per #8127, the worst PoV weight of this pallet with the proposed configurations is around 4Mb uncompressed. With 10Mb PoV, this will likely be okay for now. Therefore, this PR is adding the infrastructure for lazy deletion, yet it is not activating it.

I am raising this in advance because

It will prevent a data migration in the past. Migrating a Map to DoubleMap and so on in transit is a pain
The polkadot-staking-miner better adapt itself already to read these storage items in the new format

A follow-up may replace OnRoundRotation with (), and add the transactions/#[pallet::task] needed to do the deletion lazily.

cc @niklasad1 @sigurpol for review and a 👍 that this will be incorporated in the miner.

Reflection

In the past months, I have been working on two pallets:

pallet-staking-async, a lot of code for which I have written in 2019/2020, with little experience in FRAME.
pallet-election-provider-multi-block, which I started writing in 2021, and resumed in 2025 (yes, 4 years later :D). But crucially, I can acknowledge that by this time I had a much better understanding of how to write FRAME pallets in a safe and extensible way.

And nowadays, I can feel the difference very clearly. Adding anything in pallet-staking-async seems unsafe, a million test cases break, but they are mostly false negatives, leading to uncertainty and more likelihood of missing the few that are actually false-positives. The code has patchy structure, and has a lot of assumptions that are, perhaps written down somewhere but are not particularly enforced.

Contrary, adding a new feature to pallet-election-provider-multi-block, like this PR does, feels like a smooth ride. I know the code has many checks in place, it will prevent me from making mistakes, the APIs are much clearer, and in general I have very little doubt of something breaking.

And I think this boils down to two practices that I deployed in the latter, but didn't have the expertise to do in the former:

Composite storage items: Almost any complex FRAME pallet has a number of storage items that are related, and it helps A TON if you put them behind a type the does the reads and writes together, and asserts all invariants on the spot via a fn mutate_checked and/or asserting it in a fn try_state that is executed in all unit tests.
I spent almost as much time and energy in developing the mock.rs as I did in developing the core pallet logic. Having a great test setup will definitely help. I often see developers giving very little attention to their mock.rs, it being a copy and paste of the neighboring pallet, and I find this to be counter-productive: It saves you time today, but it will have indefinite cumulative burden on you and others in the future.

I wrote more about all of these in a very old forum post: https://forum.polkadot.network/t/testing-complex-frame-pallets-discussion-tools/356#composite-semi-private-storage-types-3

sigurpol · 2025-04-24T07:57:27Z

I will let @niklasad1 to proper comment on the actual changes 😅
If I get the main impact for the miner, I am fine in adapting the miner to handle key storage items by round index.

I am wondering what the impact for the clearing submission strategy is and if we still need to reclaim deposit explicitly with this new approach or if it is something that the chain can do for the miner instead.

Is the one below a fair summary for miner's Point of View?

Current behavior:

The staking miner needs to be careful about previous submissions potentially interfering with new ones during round transitions. When a new election round starts, the miner might need to explicitly clear or withdraw previous solutions (see paritytech/polkadot-staking-miner#1029).

New behavior with round-indexed storage:

Automatic round separation: Since all storage items are now keyed by round index, the chain automatically separates solutions from different rounds
No manual clearing needed: The miner no longer needs to explicitly clear previous submissions - they're automatically isolated in their own round-specific storage so potentially could give the deposit back without manual intervention of the miner. [I am really not sure if I get this point right!!!!]
Race condition prevention: Even if the miner is submitting a solution for round N while round N+1 starts, there's no conflict because they're stored separately

Or do I get it wrong, @kianenigma ?

niklasad1 · 2025-04-24T08:39:54Z

This shouldn't be an issue for the staking-miner rather than that we have to update the metadata and it only affects that multi-block verifier miner doesn't have to clean up old data?

Maybe worse db migrations etc

You got my 👍

kianenigma · 2025-04-24T10:29:46Z

I am wondering what the impact for the clearing submission strategy is and if we still need to reclaim deposit explicitly with this new approach or if it is something that the chain can do for the miner instead.

The signed phase hasn't really changed in this PR -- the deletion of data was lazy, and it will remain as it was before.

No manual clearing needed: The miner no longer needs to explicitly clear previous submissions - they're automatically isolated in their own round-specific storage so potentially could give the deposit back without manual intervention of the miner. [I am really not sure if I get this point right!!!!]

So yea, this is not a correct statement.

The old signed submissions are already key-ed by round, and they can be deleted later. Also note that you don't have to do it asap, you will just get your deposit back when you clear it.

sigurpol · 2025-04-24T10:33:46Z

I am wondering what the impact for the clearing submission strategy is and if we still need to reclaim deposit explicitly with this new approach or if it is something that the chain can do for the miner instead.

The signed phase hasn't really changed in this PR -- the deletion of data was lazy, and it will remain as it was before.

No manual clearing needed: The miner no longer needs to explicitly clear previous submissions - they're automatically isolated in their own round-specific storage so potentially could give the deposit back without manual intervention of the miner. [I am really not sure if I get this point right!!!!]

So yea, this is not a correct statement.

The old signed submissions are already key-ed by round, and they can be deleted later. Also note that you don't have to do it asap, you will just get your deposit back when you clear it.

Great, thank you, makes sense!

sigurpol · 2025-04-25T08:34:58Z

No manual clearing needed: The miner no longer needs to explicitly clear previous submissions - they're automatically isolated in their own round-specific storage so potentially could give the deposit back without manual intervention of the miner. [I am really not sure if I get this point right!!!!]

So yea, this is not a correct statement.

The old signed submissions are already key-ed by round, and they can be deleted later. Also note that you don't have to do it asap, you will just get your deposit back when you clear it.

@kianenigma , got it. Not for now but in a distant future, what does prevent us to move the responsibility to clear / reclaim the deposit from the miner to the chain / pallet ? In theory when you do the lazy deletion and everything, you will have all the information about discarded solutions or not?

kianenigma · 2025-04-25T09:06:10Z

miner to the chain / pallet

Yeah we could, but the problem is that when the pallet logic force-deletes everything all at once, you end up with a lot of storage operations in a single block. The blockchain programming model is kinda uncomfortable here, but we have some tools for it:

on_idle hook
pallet::tasks system
and the most un-opinionated one, the chain just does nothing, but has some transactions that do the cleanup, and if done correctly, they are free

`desired_targets` storage item now requires the round index as a parameter, so this change correctly aligns our code with the updated storage API in the Polkadot SDK. See [SDK PR](paritytech/polkadot-sdk#8304).

* Update multiblock metadata (tracking SDK rev 8da42e8fae) `desired_targets` storage item now requires the round index as a parameter, so this change correctly aligns our code with the updated storage API in the Polkadot SDK. See [SDK PR](paritytech/polkadot-sdk#8304). * Handle round parameter in snapshot functions [SDK #8304](https://github.com/paritytech/polkadot-sdk/pull/8304/files) has introduced a breaking change for which also snapshot methods are keyed on round index. Update the code accordingly. * Fix fmt

@niklasad1

…tion (paritytech#8304) Follow-up to: paritytech#8127 This PR makes all the hefty storage items in `multi-block` and `multi-block::verifier` to be keyed by a round index (note that the same has already been done for `multi-block::signed`). This allows us to stop deleting all the data upon a round ending, and instead simply know that it is stale, and just delete it later via `#[pallet::task]`, or a free extrinsic. As per paritytech#8127, the worst PoV weight of this pallet with the proposed configurations is around 4Mb uncompressed. With 10Mb PoV, this will likely be okay for now. Therefore, this PR is adding the infrastructure for lazy deletion, yet it is not activating it. I am raising this in advance because 1. It will prevent a data migration in the past. Migrating a Map to `DoubleMap` and so on in transit is a pain 2. The `polkadot-staking-miner` better adapt itself already to read these storage items in the new format A follow-up may replace `OnRoundRotation` with `()`, and add the transactions/`#[pallet::task]` needed to do the deletion lazily. cc @niklasad1 @sigurpol for review and a 👍 that this will be incorporated in the miner. --- ## Reflection In the past months, I have been working on two pallets: 1. `pallet-staking-async`, a lot of code for which I have written in 2019/2020, with little experience in FRAME. 2. `pallet-election-provider-multi-block`, which I started writing in 2021, and resumed in 2025 (yes, 4 years later :D). But crucially, I can acknowledge that by this time I had a much better understanding of how to write FRAME pallets in a safe and extensible way. And nowadays, I can feel the difference very clearly. Adding anything in `pallet-staking-async` seems unsafe, a million test cases break, but they are mostly false negatives, leading to uncertainty and more likelihood of missing the few that are actually false-positives. The code has patchy structure, and has a lot of assumptions that are, perhaps written down _somewhere_ but are not particularly enforced. Contrary, adding a new feature to `pallet-election-provider-multi-block`, like this PR does, feels like a smooth ride. I know the code has many checks in place, it will prevent me from making mistakes, the APIs are much clearer, and in general I have very little doubt of something breaking. And I think this boils down to **two practices** that I deployed in the latter, but didn't have the expertise to do in the former: 1. **Composite storage items**: Almost any complex FRAME pallet has a number of storage items that are related, and it helps A TON if you put them behind a type the does the reads and writes together, and asserts all invariants on the spot via a `fn mutate_checked` and/or asserting it in a `fn try_state` that is executed in all unit tests. 2. I spent almost as much time and energy in developing the `mock.rs` as I did in developing the core pallet logic. Having a great test setup will definitely help. I often see developers giving very little attention to their `mock.rs`, it being a copy and paste of the neighboring pallet, and I find this to be counter-productive: It saves you time today, but it will have indefinite cumulative burden on you and others in the future. I wrote more about all of these in a very old forum post: https://forum.polkadot.network/t/testing-complex-frame-pallets-discussion-tools/356#composite-semi-private-storage-types-3

* master: (120 commits) [CI] Improve GH build status checking (#8331) [CI/CD] Use original PR name in prdoc check for the backport PR's to the stable branches (#8329) Add new host APIs set_storage_or_clear and get_storage_or_zero (#7857) push to dockerhub (#8322) Snowbridge - V1 - Adds 2 hop transfer to Rococo (#7956) [AHM] Prepare `election-provider-multi-block` for full lazy data deletion (#8304) Check umbrella version (#8250) [AHM] Fully bound staking async (#8303) migrate parachain-templates tests to `gha` (#8226) staking-async: add missing new_session_genesis (#8310) New NFT traits: granular and abstract interface (#5620) Extract create_pool_with_native_on macro to common crate (#8289) XCMP: use batching when enqueuing inbound messages (#8021) Snowbridge - Tests refactor (#8014) Allow configuration of worst case buy execution weight (#7944) Fix faulty pre-upgrade migration check in pallet-session (#8294) [pallet-revive] add get_storage_var_key for variable-sized keys (#8274) add poke_deposit extrinsic to pallet-recovery (#7882) `txpool`: use tracing for structured logging (#8001) [revive] eth-rpc refactoring (#8148) ...

@niklasad1

…tion (#8304) Follow-up to: #8127 This PR makes all the hefty storage items in `multi-block` and `multi-block::verifier` to be keyed by a round index (note that the same has already been done for `multi-block::signed`). This allows us to stop deleting all the data upon a round ending, and instead simply know that it is stale, and just delete it later via `#[pallet::task]`, or a free extrinsic. As per #8127, the worst PoV weight of this pallet with the proposed configurations is around 4Mb uncompressed. With 10Mb PoV, this will likely be okay for now. Therefore, this PR is adding the infrastructure for lazy deletion, yet it is not activating it. I am raising this in advance because 1. It will prevent a data migration in the past. Migrating a Map to `DoubleMap` and so on in transit is a pain 2. The `polkadot-staking-miner` better adapt itself already to read these storage items in the new format A follow-up may replace `OnRoundRotation` with `()`, and add the transactions/`#[pallet::task]` needed to do the deletion lazily. cc @niklasad1 @sigurpol for review and a 👍 that this will be incorporated in the miner. --- ## Reflection In the past months, I have been working on two pallets: 1. `pallet-staking-async`, a lot of code for which I have written in 2019/2020, with little experience in FRAME. 2. `pallet-election-provider-multi-block`, which I started writing in 2021, and resumed in 2025 (yes, 4 years later :D). But crucially, I can acknowledge that by this time I had a much better understanding of how to write FRAME pallets in a safe and extensible way. And nowadays, I can feel the difference very clearly. Adding anything in `pallet-staking-async` seems unsafe, a million test cases break, but they are mostly false negatives, leading to uncertainty and more likelihood of missing the few that are actually false-positives. The code has patchy structure, and has a lot of assumptions that are, perhaps written down _somewhere_ but are not particularly enforced. Contrary, adding a new feature to `pallet-election-provider-multi-block`, like this PR does, feels like a smooth ride. I know the code has many checks in place, it will prevent me from making mistakes, the APIs are much clearer, and in general I have very little doubt of something breaking. And I think this boils down to **two practices** that I deployed in the latter, but didn't have the expertise to do in the former: 1. **Composite storage items**: Almost any complex FRAME pallet has a number of storage items that are related, and it helps A TON if you put them behind a type the does the reads and writes together, and asserts all invariants on the spot via a `fn mutate_checked` and/or asserting it in a `fn try_state` that is executed in all unit tests. 2. I spent almost as much time and energy in developing the `mock.rs` as I did in developing the core pallet logic. Having a great test setup will definitely help. I often see developers giving very little attention to their `mock.rs`, it being a copy and paste of the neighboring pallet, and I find this to be counter-productive: It saves you time today, but it will have indefinite cumulative burden on you and others in the future. I wrote more about all of these in a very old forum post: https://forum.polkadot.network/t/testing-complex-frame-pallets-discussion-tools/356#composite-semi-private-storage-types-3

* Update multiblock metadata (tracking SDK rev 8da42e8fae) `desired_targets` storage item now requires the round index as a parameter, so this change correctly aligns our code with the updated storage API in the Polkadot SDK. See [SDK PR](paritytech/polkadot-sdk#8304). * Handle round parameter in snapshot functions [SDK #8304](https://github.com/paritytech/polkadot-sdk/pull/8304/files) has introduced a breaking change for which also snapshot methods are keyed on round index. Update the code accordingly. * Fix fmt

* tests: add e2e tests for multi-block miner * tweak config files for gha * fix fmt * Fix snaphot fn after metadata update (#1037) * Update multiblock metadata (tracking SDK rev 8da42e8fae) `desired_targets` storage item now requires the round index as a parameter, so this change correctly aligns our code with the updated storage API in the Polkadot SDK. See [SDK PR](paritytech/polkadot-sdk#8304). * Handle round parameter in snapshot functions [SDK #8304](https://github.com/paritytech/polkadot-sdk/pull/8304/files) has introduced a breaking change for which also snapshot methods are keyed on round index. Update the code accordingly. * Fix fmt * chore(deps): bump actions/download-artifact from 4.2.1 to 4.3.0 (#1039) Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 4.2.1 to 4.3.0. - [Release notes](https://github.com/actions/download-artifact/releases) - [Commits](actions/download-artifact@v4.2.1...v4.3.0) --- updated-dependencies: - dependency-name: actions/download-artifact dependency-version: 4.3.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Do not build omni-node * Check RPC port for the parachain collator before running the miner Do not rely on specific text in polkadot logs but on the parachain collator to be up and running on the RPC port specified on the zombienet configuration. * Fix wrong page index while checking page submissions * Test successful <=> users is rewarded after score and all pages are submitted * Cleanup * Improve error handling * Add a TODO around proper zombienet cleanup --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Paolo La Camera <paolo@parity.io> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

kianenigma added 2 commits April 23, 2025 09:58

lazy deletion infra for epmb

a68fd8f

Add OnRoundRotation hook

78b97db

kianenigma requested review from niklasad1 and sigurpol April 23, 2025 09:36

kianenigma requested a review from a team as a code owner April 23, 2025 09:36

kianenigma changed the title ~~Prepare election-provider-multi-block for full lazy data deletion~~ [AHM] Prepare election-provider-multi-block for full lazy data deletion Apr 23, 2025

kianenigma added the R0-no-crate-publish-required The change does not require any crates to be re-published. label Apr 23, 2025

niklasad1 approved these changes Apr 24, 2025

View reviewed changes

Ank4n assigned kianenigma Apr 24, 2025

sigurpol approved these changes Apr 24, 2025

View reviewed changes

sigurpol mentioned this pull request Apr 24, 2025

Update metadata and logic to handle storage items keyed by round index paritytech/polkadot-staking-miner#1032

Closed

Merge branch 'master' into kiz-empb-lazy-deletion-1

f985ece

kianenigma added this pull request to the merge queue Apr 25, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 25, 2025

Merge branch 'master' into kiz-empb-lazy-deletion-1

ee785e1

kianenigma enabled auto-merge April 25, 2025 12:07

kianenigma added this pull request to the merge queue Apr 25, 2025

Merged via the queue into master with commit 98c3154 Apr 25, 2025
239 of 249 checks passed

kianenigma deleted the kiz-empb-lazy-deletion-1 branch April 25, 2025 14:07

sigurpol mentioned this pull request Apr 25, 2025

Update multiblock metadata (tracking SDK rev 8da42e8fae) paritytech/polkadot-staking-miner#1036

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AHM] Prepare `election-provider-multi-block` for full lazy data deletion#8304

[AHM] Prepare `election-provider-multi-block` for full lazy data deletion#8304
kianenigma merged 4 commits intomasterfrom
kiz-empb-lazy-deletion-1

kianenigma commented Apr 23, 2025 •

edited

Loading

Uh oh!

sigurpol commented Apr 24, 2025 •

edited

Loading

Uh oh!

niklasad1 commented Apr 24, 2025 •

edited

Loading

Uh oh!

kianenigma commented Apr 24, 2025

Uh oh!

sigurpol commented Apr 24, 2025

Uh oh!

sigurpol commented Apr 25, 2025

Uh oh!

kianenigma commented Apr 25, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

kianenigma commented Apr 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reflection

Uh oh!

sigurpol commented Apr 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Current behavior:

New behavior with round-indexed storage:

Uh oh!

niklasad1 commented Apr 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kianenigma commented Apr 24, 2025

Uh oh!

sigurpol commented Apr 24, 2025

Uh oh!

sigurpol commented Apr 25, 2025

Uh oh!

kianenigma commented Apr 25, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kianenigma commented Apr 23, 2025 •

edited

Loading

sigurpol commented Apr 24, 2025 •

edited

Loading

niklasad1 commented Apr 24, 2025 •

edited

Loading